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Abstract 

We consider the problem of statistical inference on unknown quantities structured as a multiway 
table. We show that such multiway tables are naturally formed by arranging regression coefficients in 
complex systems of linear models for association analysis. In genetics and genomics, the resulting two- 
way and three-way tables cover many important applications. Within the Bayesian hierarchical model 
framework, we define the structure of a multiway table through prior specification. Focusing on model 
comparison and selection, we derive analytic expressions of Bayes factors and their approximations 
and discuss their theoretical and computational properties. Finally, we demonstrate the strength of 
our approach using a genomic application of mapping tissue-specific eQTLs (expression quantitative 
loci). 



1 Introduction 



In genetics, it is now well known that DNA mutations are widely-spread on genomes (Balding 



their effects on phenotypes may vary largely under different environmental exposures (Hunter 



)); 
)); 

and a single genetic variant may impact multiple phenotypes through gene networks (Hodgkin (1998)). 
To comprehensively understand the role of genetic variants in biological processes, modern-day genetic 
studies have been trending towards designing experiments to interrogate genetic variants genome-wide 
and investigate their associations with multiple phenotypes under various environmental conditions 
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(Dimas et al. (2009)). Ideally, the scientific findings from such experiments should be summarized in 
multiway tables, in which each entry characterizes the magnitude of genetic association (i.e. genetic 
effect) of a particular variant, under a specific environmental condition, with respect to a unique phe- 
notype measurement. In practice, because these quantities are not directly observed, efficient statistical 
inference on full multiway tables becomes an important goal for analyzing genetic data. 

Motivated by genetic applications, this paper discusses the general problem of statistical inference on 
unobserved multiway tables in association analysis. Firstly, we show that multiway tables are naturally 
formulated from a rather general system of linear models by re-arranging relevant regression coefficients. 
This linear system, including commonly used multiple linear regression and multivariate linear regression 
models as special cases, is adequate to address a wide range of genetic applications. It is worth noting 
that our setup differs from most existing work on multiway (tensor) data analysis in a significant 
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way. Most existing approaches (Kolda and Bader (2009), Hoff (2010)) focus on the multi-dimensional 
structure of observed data, as one of the most common goals is to identify their lower-dimensional 
representations. In our setting, multiway structures are built of unobserved regression coefficients and 
our interest is to infer the multiway tables in their pre-defined dimensions. 

In this paper, we emphasize simultaneous inference on multiway tables. In comparison, a naive approach 
would fill out multiway tables one entry at a time (e.g. by fitting separate simple linear regressions) while 
temporarily ignoring the existence of other entries. This method is not only statistically inferior but 
also possibly defies the purpose of the specific study design. For example, in meta-analysis of genetic 
association studies, it is only sensible to jointly analyze genetic associations across all participating 
studies (it is easy to see that the genetic effects of a single variant across multiple studies form a 
one-dimensional array). Although the limitation of the naive method is generally well understood, 
most currently available joint inference approaches only deal with the special case of one-dimensional 
multiway tables. In statistical genetics literature, many variable selection methods, both in Bayesian 
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Frequentist (Hoggart et al. (2008), Wu et al. 



(2009)) domains, have been proposed to identify multiple 
relevant mutations with respect to a single trait. In meta-analysis and detecting GxE interaction, 



single variant association testings across multiple subgroups have been thoroughly studied (Lebrec 



et al. (2010), Han and Eskin (2011), Wen and Stephens (2011)) and widely applied. Methodologies for 
understanding the impacts of genetic variants on multiple phenotypic traits are being developed, but 



most methods available now propose analyzing one single genetic variant at a time (Stephens (2010)). 
The generalization of multiway tables not only includes all above applications as special cases, it also 
enables us to consider a much richer set of important scientific applications. One such example is the 
fine-mapping of genetic associations in a meta-analysis setting, which aims to identify possibly multiple 
phenotype-associated mutations by taking into account linkage disequilibrium (LD) and integrating 
evidence across multiple studies. This problem is most naturally formulated as an inference on a two- 
way table of genetic effects, with one dimension including multiple candidate variants and the other 
dimension consisting of multiple participating studies. 

In this paper, we take a Bayesian hierarchical model approach to define the complex structure of 
multiway tables via prior specification and deal with potential correlations in observed data through 
likelihood functions. This separation makes it conceptually easy to apply the proposed statistical 
framework in context-dependent settings: we argue that the priors on the multiway effects can be 
"generic" depending solely on the scientific inquires of interest, whereas only likelihood functions should 
be adjusted for different experimental designs and sampling procedures. 

Our primary focus in inference is to perform model comparison and selection (with hypothesis testings 
as special cases). In our settings, different candidate multiway tables are characterized by their priors 
and typically non-nested. In the Bayesian framework, comparing non-nested hierarchical models is 
straightforward by utilizing Bayes factors. In the following proceeding sections, we derive Bayes factors 
and their approximations for multiway tables. We discuss both the theoretical and the computational 
properties of our analytic results and demonstrate their efficiency in applications of Bayesian model 
selections. 



2 



2 Linear Systems for Modeling Multiway Tables 



Linear models are effective statistical tools for association analysis. In this section, we describe a general 
linear model system where regression coefficients naturally form multiway tables. 

2.1 System of Simultaneous Multivariate Linear Regressions (SSMR) 

A system of simultaneous multivariate linear regressions (SSMR) consists of a set of s multivariate linear 
regression equations, i.e., 



where "MN" denotes a matrix-variate normal distribution and each linear equation describes one of s 
non-overlapping subgroups of observed data. 

For subgroup i with n« subjects, Y-i is ainijXr matrix with each row representing the r quantitative 
measurements from one subject; = (X c ^ Xg^) is an rij x (qi+p) design matrix, in which X g ^ represents 
the data matrix of p explanatory variables of interest (e.g. genotypes of interrogated genetic variants) 
and X c> i represents the data of qi additional variables (including the intercept) to be controlled for; Bi 



is a (qi + p) x r matrix of regression coefficients and it can be further decomposed into Bi = I c ' 1 I 

V B 9,i ) 

in which matrices B g ^ (p x r) and B c ^ (qi x r) contains the regression coefficients for explanatory and 
controlled variables respectively; finally, Ei is an rij x r matrix of residual errors where each row vector 
is assumed to be independent and identically distributed as N(0, Sj). 

Although the same set of r response variables and p explanatory variables are assumed to be measured in 
all s subgroups, we allow each linear model to control for a different set of covariates. Furthermore, the 
residual errors are assumed independent across subgroups. In addition, we denote Y := {Y\, . . . ,Y S }, 
X := {X±, . . . ,X S } and := {Si,...,S s } (throughout the paper, we refer to H as "residual error 
variances" ) . 

It should be clear that in the SSMR, each B 9t i forms a two-dimensional slice and a three-way table of 
interest for inference can be constructed by joining all s slices. Correspondingly, the resulting three- 
way table can be "unfolded" into a one-dimensional vector, denoted by f3 g , which is mathematically 



Y i = X i B i + E i , Ei ~ MN (0, 1, Sj) , i = l,...,s, 



(2.1) 




convenient to work with. More specifically, we define 




and similarly, j3 c := 



/ vec(i?; i ) \ 




/3 ff ~N(0,W ff ). 



(2.2) 
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The Variance-covariance matrix W g plays a central role in our framework and we defer the detailed 
discussions to section [3j For the regression coefficients of controlled variables, we assume 

P c ~N{0,* c ), (2.3) 

where matrix ^f c is assumed diagonal. When performing inference, we consider the limiting condition 
^ / C T 1 — > 0. Furthermore, we assume 3 g and 3 C are a priori independent. Thus, the vector containing all 

/ \ 

the regression coefficients, defined by 3 sys := ^ ^ J , has the following prior distribution 

/3 sys ~N(o,* c ewg, (2.4) 

where "©" denotes the matrix operation of direct sum. For the residual error variances, we assign an 
independent inverse Wishart prior for each composing Sj, i.e., 

Si ~IW(Hi,mi). (2.5) 

In the special case that r = 1, the composing multivariate linear regressions degenerate to multiple 
linear regressions , accordingly, the univariate version of the inverse Wishart distribution reduces to an 
inverse Gamma distribution. 



2.2 Special Cases and Applications in Genetics 

The SSMR model is a generalization of a class of linear systems (e.g., when s = 1 and r = 1, it becomes 
a multiple linear regression model). Along with its special cases, the SSMR model is capable of handling 
various applications involving most two-way and some three-way tables in applications of genetics and 
genomics. 

• Multivariate Linear Regression (MVLR). When a data set contains only a single subgroup, i.e. 
s = 1, the SSMR model reduces to a single multivariate linear regression (MVLR) model. In case 
that a single set of unrelated individuals are sampled from the population, the MVLR model is 
appropriate for at least two major applications in genetics. The first application studies genetic 
associations of multiple genetic variants with respect to multiple phenotypes; whereas the second 
application investigates genetic associations of multiple genetic variants with a single phenotype 
but in different environmental conditions (i.e., a specific study design for investigating gnen- 
environment interactions). The data application we show in this paper is an example of the latter 
case. 

• System of Simultaneous Linear Regressions (SSLR). This special case arises if each composing 
multivariate linear regression reduces to a multiple linear regression, i.e. r = 1. In association 
studies of multiple genetic variants with a single phenotype, unrelated individual samples can 
form non-overlapping subgroups in the following scenarios. The first scenario arises in meta- 
analyses, where each subgroup represents a participating study. In the second possible scenario, 
gene-environment interaction is of interest and for each environmental condition, a unique subset 
of individuals is sampled. 
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The general SSMR model is also uniquely important for many genetics/genomics applications. One 
such example is the meta-analysis of genetic variants with respect to multiple phenotypes. In this case, 
the point of interest is typically the complete three-way table of genetic effects (although when genetic 
variants are analyzed one at a time, i.e. p = 1, the three-way table degenerates to a two-way table). 



3 Prior Specification for Multiway Tables 

In our Bayesian framework, priors play a vital role in both defining the structure and specifying the 
a priori relationships in a multiway table. Prior distributions complement likelihood functions to fully 
construct hierarchical models which allow information to be efficiently shared across and within different 
dimensions of a multiway table. In this section, we discuss some of the technical considerations in 
specifying prior distributions for (3 g , the vectorized multiway table, and illustrate the strength and the 
modeling flexibility of our Bayesian framework in handling the complex structures of multiway tables. 

Based on the Bayesian principle, in considering priors alone, we argue that the scientific problems in 
hand dictate the specification of priors, whereas other factors like data collection mechanisms should 
have little or no influence. By this reasoning, we conclude that the prior specifications for different 
experimental designs should remain the same, as long as they all aim at the same scientific inquiries. 



For example, as we have shown in section 2.2 for investigating gene-environment interactions, different 
sampling schemes can lead to either the SSLR or the MVLR model; however, this difference should 
not vary our a priori expectation on the potential underlying gene-environment interactions. Also as 
a consequence, the conjugacy of the priors, which is typically linked to some specific form of likelihood 
function, should not be our top consideration at this stage of model construction. 

In this work, we limit ourselves to the class of multivariate normal prior distributions represented by 



(2.2). Thus, a fully specified covariance matrix W g completes the prior distribution. Although this 
approach may appear oversimplified, we show that this class of priors can be indeed very flexible and 
useful in solving a wide range of interesting scientific problems in genetics and genomics. 

It is worth pointing out that the only technical requirement for W g is its positive semi-definiteness. In 
particular, we emphasize that prior covariance matrices can be rank-deficient. The singularity of W g 
matrix directly reflects some linear restrictions on elements in a multiway table and it is extremely useful 
to describe some candidate models residing in a lower-dimensional space. For example, in a fixed-effect 
meta-analysis, effects of a common variable are assumed to be the same across different studies. In 



our framework, this linear restriction can be represented by a singular W g matrix (appendix A.3). In 
another example, to assert a particular entry of a multiway table is exactly 0, we simply zero-out the 
corresponding row and column in the W g matrix, which again leads to the resulting W g being singular. 
It is easy to see, from a generative model point of view, the linear restrictions imposed by such singular 
normal prior distributions are enforced in the posterior distributions through Bayesian computation. 



3.1 Alternative Parametrization of W g 

To start this section, we first formally define the skeleton of a multiway table: 

DEFINITION 1. The skeleton of a multiway table is a binary-valued multi- dimensional array whose 
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layout and structure are identical to the original multiway table. Each entry in the skeleton indicates if 
the corresponding element in the original multiway table is non-zero. 



It should be noted in many model/variable selection and hypothesis testing problems, the skeleton of 
a multiway table, rather than the full table, is of primary interest. In these cases, it is helpful to 
re-parametrize a W g matrix by 



where T g is a binary matrix with the same row and column layout as matrix W g and its entries of 
value 1 mapping the non-zero entries of W g ; and A g = {wij} represents the indexed set of actual values 
of corresponding non-zero entries in W g . Because of the one-to-one mapping between the entries of 
a multiway table and the diagonal elements of the corresponding T g matrix, the information of the 
skeleton is fully conveyed in the T g matrix. In addition, the T g matrix also carries information on a 
priori correlation structure in a multiway table, which is typically also functionally determined by its 
skeleton in a context-dependent manner (we show by examples in sections |3.2[ [5] and appendix |A| . 
Thus, there always exists a bijection (whose exact form is context-dependent) between a skeleton and 
its corresponding T g matrix. 

Furthermore, this formulation yields a principled way to specify a hyper-prior distribution on matrix 



3.2 Example: Prior Specification in Genetic Applications 

This section demonstrates the construction of W g in genetic and genomic applications. In particular, 
the emphasis is on the specifications of matrix T g and the conditional distribution A g | T g for a given 
skeleton in such context. 

In genetic applications of multiway tables, the dimensions that are commonly of interest include 

1. The dimension of genetic variants 

2. The dimension of phenotypes 

3. The dimension of subgroups 

The first two dimensions are straightforward to understand, the subgroup dimension is a generic term 
for which interpretations may vary in different applications: in detecting gene-environment interactions, 
subgroups correspond to different environmental conditions; whereas in meta-analyses, samples from 
different studies form subgroups. In genetic applications where all three dimensions are considered, we 
show that the corresponding W g can be reasonably decomposed into a block diagonal matrix. In brief, 
the decomposition is based on the following prior assumptions: 

1. the genetic effects of different genetic variants are assumed to be independent a priori. 





W g , i.e. 
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2. for a single genetic variant, its effects on different phenotypes are assumed to be independent a 
priori conditional on the very genetic variant has some effects on those phenotypes. 



3. If a variant-phenotype pair is assumed associated, the effects of genetic associations in different 



subgroups can be modeled by the Bayesian prior proposed by Wen and Stephens (2011) to take 
into account potential heterogeneity. 



It should be noticed that all above prior assumptions are originated from previous works on various 
one-dimensional array of genetic effects and they are naturally assembled together. In appendix [SJ 
we give detailed discussions on these assumptions. Furthermore, we note that this example merely 
represents a general starting point for relevant genetic applications: with more specific information 
available, these additional knowledge can be naturally incorporated into our prior constructions in a 



similar way demonstrated by Stingo et al. (2011) and Veyrieras et al. (2008). 



3.3 Scale-invariant Prior Formulation 

In practice, it is often desired that inference results are invariant to scale transformations of response 
variables. In Bayesian analysis of multiple linear regressions, this property can be achieved by scaling 
regression coefficients by the residual standard errors and performing inference on the transformed 



regression coefficients (Servin and Stephens (2007), Wen and Stephens (2011)). Most importantly, the 
prior distribution is assigned to transformed regression coefficients. Here, we extend this recipe to the 
SSMR models. 

Specifically, we scale each element in (3 g by its corresponding residual standard error (in the MVLR, the 
residual standard error for a given regression coefficient is referred to the square root of the corresponding 
diagonal element in the S matrix) . More formally, we define a vector of scale- free standardized effects 
by 



(3.3) 



where S is a diagonal matrix permuted from ®f =1 (I <8> diag(Sj)) to match the order of elements in /3 . 
It is easy to see that the resulting b g is unitless. 

Under this setting, a multivariate normal prior distribution b g ~ N(0, U g ) induces a normal prior 
distribution on j3 g with mean and 



W a 



55 UgS*. 



(3.4) 



With (3.4), we are able to handle the desired scale-invariant prior formulation as a special case of original 



scale formulation. 



4 Bayes Factors for Multiway Tables 

In this section, we derive Bayes factors to facilitate model comparisons and selections for multiway 
tables in the Bayesian framework. At the most fundamental level, Bayes factors enable us to compare 



7 



the supporting evidence from observed data for a set of competitive models (which are not necessarily 
nested). In case that posterior model probabilities are of direct interest, Bayes factors can be typically 
utilized as an intermediate computational device for marginal likelihood. For example, in the Metropolis- 
Hastings algorithm, the Hastings ratio can be evaluated by directly plugging in the Bayes factor of a 
newly proposed versus current models. There, the computational gain is obtained due to the fact that 
Bayes factors pre-process the integrals of most nuisance parameters presented in both (proposed and 
current) models. 

In what follows, we discuss the Bayes factors derived from the most general case of the SSMR model 



assuming the multivariate normal prior (2.2) for a multiway table 



We first give the formal definition of a Bayes factor for a multiway table. Let Hq denote the trivial 
null model where /3 g = 0, then for a multiway table characterized by its prior variance W g , we define a 
null-based Bayes factor (Liang et al. ( 2008| >) as follows 



DEFINITION 2. Under the SSMR model, for a positive definite W g , the Bayes factor is defined as 

For technical reasons, the above definition requires W g to be full-rank, we will extend this definition to 
allow for singular W g matrix later in section 4.1.3 



4.1 Analytical Results on Bayes Factors and Their Analytic Approximations 

In this section, we discuss our main analytic results on Bayes factors for multiway tables based on the 
SSMR model. We start by introducing some necessary additional notations: We use (3 g to denote the 
least square estimate (also the MLE) of (3 g , the vectorized multiway table of interest, and denote its 
variance by V g := Var(/3 9 ). It should be noted that under the SSMR model, both g and V g have 
closed- form expressions: g depends only on observed data X and Y, while V g depends on X and 
(their explicit forms are given in appendix [B|) . 

4.1.1 Exact Bayes Factor with Known Residual Error Variances 

In the general case of the SSMR model, when the residual error variances are considered known, rather 
than being assigned priors, the exact Bayes factor for a multiway table defined by a positive-definite 
W g can be analytically expressed. We summarize this result in the following lemma: 

LEMMA 1. In the SSMR model, if XI is known, the Bayes factor in definition 2 can be analytically 
expressed by 

BF(W g ) = ll + V^Wg]-* -exp Q/^ 1 [W g {I + V" 1 ^)" 1 ] V g ~ 1 $ g ] • (4.2) 



The derivation of lemma 1 is mostly straightforward and we leave the details in appendix B.l 



S 



NOTE 1. Under the setting of lemma 1 and in addition, provided V g is also full-rank, the posterior 
distribution of (3 is given by 

(3 g | Y, X, S ~ N ((y- 1 + W~ 1 )~ 1 V~ 1 g , (V- 1 + H^" 1 )- 1 ) . (4.3) 

The quadratic form inside the exponential function of T5F(W g ) is equivalent to a multivariate Wald 
statistic computed using the mean and variance from the posterior distribution. 

NOTE 2. The Bayes factor naturally deals with potential collinearity in predictors. In particular, the 
evaluation of the Bayes factor does not require the involving design matrices to be full-rank (the details 
are explained in appendix\Cfy. As a result, when highly correlated explanatory variables are included in 
the model, the Bayes factor can still be stably computed. 



Note 2 is extremely important for genetic applications where genotypes of many spatially closed genetic 
variants are often highly correlated. In these situations, the desired Bayes factors can be computed 
straightforwardly without special computational treatments (e.g. Moore-Penrose pseudoinverse) . 



4.1.2 Approximate Bayes factor with Unknown Residual Error Variances 

In more realistic settings, residual error variances are typically unknown and additional integrations 
with respect to X! are necessary for evaluating desired Bayes factors. Except for a few notable special 
cases (e.g. the MVLR model with conjugate priors), the exact Bayes factor generally is analytically 
intractable. 

Alternatively, we apply Laplace's method to pursue analytic approximations of exact Bayes factors. 
Laplace's method has been widely applied in efficient computation of Bayes factors in less complicated 



models (Kass and Raftery (1995), Raftery (1996), DiCiccio et al. (1997), Saville and Herring (2009), 



Wen and Stephens (2011)). In the case of SSMR model, we identify two different applications of 
Laplace's method, from both of which the resulting approximate Bayes factors (ABFs) maintain the 



exact functional form of (4.2), however replace XI with different point estimates. 



The first application of the Laplace's method leads to the use of the following point estimate for each 



1 



Hi + (Yi - XiBiYiY - XiBi 



(4.4) 



and we further denote £ := {Si, . . . , 



In (4.4), Bi denotes the least square estimate of Bi] Hi is 
(recall Sj 



a hyper-parameter from the prior distribution of EJj (recall Sj ~ TW(Hi,m,i)). Note, in the limiting 
situation such that Hi — >• 0, Sj becomes the MLE of £j, and it is also asymptotically equivalent to the 
REML estimate under this setting. 



The relevant quantities in (4.2) that are functionally related to X are V g and potentially W g (e.g. in 



the scale-invariant prior formulation). We denote V g and W g as the corresponding plug-in estimates of 



V g and W g by S. 



We formally summarize the result of the first Laplace's method in the following proposition: 
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PROPOSITION 1. Under the SSMR model, when E is unknown, applying Laplace's method leads to 
the following analytic approximation of the Bayes factor 



ABF(Wg := \I + V- l W g \~l • exp ( [w g (I + V^Wg)' 1 ] V g ~ 1 $ i 



(4.5) 



and 



BF(W g ) = ABF (W g ) J] (1 + 0{nT 1 )) . 

i=l 

Furthermore, provided n\ 3> p and n; > r for i = 1, . . . , s, the approximate Bayes factor converges 
almost surely to the true value, as sample size rii — > oo, Vz. 



Proof. See appendix B.2.1 and proof in B.2.2 



□ 



NOTE 3. Under the conditions considered in the prop ositi on 1, Ej 's can be accurately estimated by 
their MLE or RMLE. For the estimator considered in (4-4) > it follows that Ej ^4' Ej \fi. This fact, 



along with lemma 1, provides intuitive explanation for the proposition. 

An alternative application of Laplace's method results in a different analytic approximation of the Bayes 
factor: each unknown Ej G E is replaced by the following estimate from the null model, 



1 



J?; 



Ej — — I Hi + (Yi — X c iB c A (Yi — X Cy iB c 



(4.6) 



accordingly, we denote E := {Ei, . . . , E s }. More specifically, B c ^ is the least square estimate by fitting 
the trivial null model, i.e. restricting B g ^ = 0. By plugging E, we obtain corresponding estimates of V g 
and W g . 

The result of the second approximation is formally summarized in below: 

PROPOSITION 2. In the SSMR model, when E is unknown, an alternative application of Laplace's 
method leads to the following first-order analytic approximation of the Bayes factor 

1 A/ 



and 



ABF*(W 9 ) := \I + V g l Wg\-^ ■ exp ( -0'gV' 1 [w g (I + Vf x W g )- x \ V g 1 ^ 



m(w g ) = km\w g ) • n (i + o(n^)) . 

i=i 



(4.7) 



Proof. See derivation in appendix |B.2.3 



□ 



NOTE 4. Because g can also be represented as an analytic function of B c , the MLE under the null 
model, computing ABF*(W g ) essentially only requires fitting the trivial null model , which is computa- 



tionally attractive (details are given in appendix B.2.3). 



Although the two different analytic approximations have the same order of the asymptotic error bounds, 
in finite sample situations, we expect the good accuracy from ABF* (W g ) only if the true model deviates 
slightly from the null model. This scenario, in most genetic applications, is typically expected, as most 
known genetic variants have only very small effects with respect to most phenotypes. 
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4.1.3 Singular Prior Distribution 



So far, our results on Bayes factors and their approximations all require W g to be full-rank. To extend 
the definition of Bayes factor for singular W g , we first define 

W*(\)=W g + \I, A>0, (4.8) 

where W g is only required positive semi-definite. By this definition, W g (X) is guaranteed full-rank. 
With (4.8), we are now able to extend our definition of Bayes factor for the SSMR model to include 
singular W g matrix: 

DEFINITION 3. Under the SSMR model, for a positive semi-definite W g , the Bayes factor is defined 
as 

BF(W g ) = lim BF (wJ(A)) . (4.9) 



The definition is based on the following important intuition: Bayes factors are expected to vary very 
smoothly over a continuum of models. This is not only desired but also critically important for selecting 
model consistently using Bayes factors. 

We have the following result regarding to the existence of the limit: 

PROPOSITION 3. For the SSMR model, the limiting Bayes factors in definition 3 always exist and 
well-defined, provided that W g is positive semi-definite. 



In case that S is known, the proof of proposition 2 is trivial, by noting the the result from lemma 1 
does not involve direct inverse of W g matrix and the matrix sum (/ + V~ l W g ) is always full-rank. If I] 
is unknown, the arguments are a little involved and we give the full proof for this case in appendix |D} 

In case of unknown S, Laplace's method can still be applied to derive analytic approximations of the 
Bayes factor. Briefly, in case of singular W g matrix, the result for ABF* is unchanged; whereas for 
ABF, we adjust Laplace's method to take into account the linear restrictions imposed by W g matrix. 
This strategy yields a more accurate approximation by using a better estimate of X. Nonetheless, the 
functional form of ABF remains intact. We describe the relevant technical details in appendix [Dj 

With these results, we have extended lemma 1 and propositions 1 and 2 to allow for singular W g matrix. 

4.2 Connections to Frequentist Test Statistics and BIC 
4.2.1 Connection to Multivariate Test Statistics 

Under the following prior assumptions: 

1. W g = cV g , where c is a positive scalar constant 

2. V g is full-rank 
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3. Sj ~ TW(Hi, mi) under the limiting conditions, Hi — )■ 0, for i = 1, . . . , s 



the resulting prior bears a similarity to Zellner's g-prior (Zellner (1986), Liang et al. (2008)). It can be 
shown that, under this setting, 



and 



ABF(W S ) 



ABF*(W S ) 



1 



rps 



1 + C 



1 + c 



• exp 



1 + c 



^~wald 



rps 



■ exp 



2 1 + c 



(4.10) 



(4.11) 



In particular, T wa id and T score represent the multivariate Wald statistic and the Rao's score statistic 
respectively, both of which are computed based on the SSMR model for testing Hq : (3 g = 0. Obtaining 



(4.10) is straightforward and we give the details for (4.11) in appendix E.l 



The important consequence of this connection is that, under this specific form of priors, approximate 
Bayes factors and the corresponding Frequentist test statistics yield the same rankings for a set of 
candidate models. 



Previously, Wakefield (2009), Johnson (2005, 2008) have identified similar connections between (approx- 



imate) Bayes factors and the i-statistic in simple linear regression models. Wen and Stephens| (2011) 



extends their results in a meta-analysis setting. The results we presented here are more general and 
include all previous findings as special cases. 

Albeit the connections, we do not advocate the use of these test statistics as model comparison devices. 
Especially, caution should be taken when interpreting this prior in specific contexts: for example, 



Wakefield (2009) and Wen and Stephens (2011) have shown some undesired implications of this prior 



in genetic applications (e.g., \W g \ is inversely proportionally to sample sizes). 



4.2.2 Connection to BIC 



Under the suitable conditions, we show (in appendix E.2) that the derived Bayes factor and its approx- 



imations can be further approximated by Bayesian Information Criterion (BIC, Schwarz ( |1978 )): Let 
L\ and Lq denote the likelihoods computed from the target model (characterized by W g ) and the null 
model respectively, it follows that 



log BF(W S ) = (log L x - log L ) - ^ ]T log m + 0(1 



(4.12) 



BIC is asymptotically consistent, meaning that as sample size increases to infinity and under other 
suitable conditions, BIC selects the fixed true model among a finite set of candidates with probability 1 



(Haughton (1988), Schwarz (1978)). Consequently, by (4.12), our Bayes factor and its approximations 



also enjoy this asymptotic consistency property. 

It is worth pointing out that BIC is not a universal approximation of Bayes factors. In our case, BIC 



(4.12) fails to approximate desired Bayes factors with the advocated error bound in the following two 



noticeable scenarios: 
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Wg or V g is singular. Intuitively, in this case, linear constraints on parameter space would change 
the way that "free" parameters are counted. Nonetheless, it is usually possible to resolve the 
linear constraints by transformation and re-parametrization. 



2. W g is some function of sample sizes. An interesting example is the prior we studied in the previous 



section, W n 



cV g , in which 



Wo | becomes inversely proportional to sample sizes. For an arbitrary 



fixed constant c, it is easy to see BIC fails to approximate (4.10) and (4.11). This phenomenon is 



closely related to the "information paradox" (Liang et al. (2008])) and we give detailed explanations 
in appendix |E.2 



It is also clear from the derivation that in high-dimensional settings where the condition rij ^> r,p,s 
is not satisfied for some i, then BIC becomes poor approximations to the target (approximate) Bayes 
factors, i.e. the O(l) error bound is no longer maintained. 



4.3 Extension to Non-normal Data 



Although we have been working exclusively on quantitative response variables for which normal dis- 
tributions are assumed, under certain conditions, our results can be extended to non-normal response 
variables. 

Suppose that multiway tables are modeled in a complex system of generalized linear models within the 
exponential family framework, furthermore, the MLE of the system can be numerically computed for the 
vectorized effects /3 sys . Following the standard asymptotic maximum likelihood theory, the likelihood 
of the system can be approximated by a quadratic expansion around its maximum likelihood estimate. 
This can be equivalently expressed by the following asymptotic approximation, 

P sys I Psy S ~ N (/3 sys , Var(/3 sys )) , (4.13) 

where Var(/3 sys ) is typically approximated using observed Fisher information. Combining with the prior 
distribution 



/3 sys ~N(0,* £ 



W, 



9)1 



(4.14) 



it is then straightforward to show that the resulting Bayes factor in this setting maintains the same 



functional form as (4.2) 



For a system of generalized linear models comparable to the SSLR model, (4.13) is straightforward to 



obtain, similarly to what has been shown by Wen and Stephens (2011). However, challenges remain 



for finding an appropriate likelihood function to describe correlated non-normal response data in a 



situation mimicking the MVLR model: for correlated binary data, probit model proposed by Chib and 



Greenberg (1998) seems a natural fit; however for other general data types, solutions are not generally 
readily available. 
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4.4 Bayes factors for Skeletons of Multiway Tables 



The Bayes factor of the skeleton of a multiway table, characterized by T g , in the SSMR model can be 
computed from BF(l^t / g ) based on the formulation of (3.1), i.e., 



BF(r £ 



p(A g \T g )BF(Wg)dA ; 



(4.15) 



This formulation provides us the necessary statistical instrument to perform model comparisons in the 
space of skeletons of multiway tables. In genetic applications, it is often desired to model p(A g \ T g ) by 
a finite discrete distribution (Stephens and Balding (2009)). In that case, the integration in ( 4. 15[ ) is 
replaced by a summation. 



5 Data Application 

In this section, we demonstrate our statistical approaches in mapping tissue-specific eQTLs. eQTLs 
(expression Quantitative Loci) are genetic variants associated with gene expression levels and they pro- 
vide biological insights on interplays of genetic variants and gene regulation processes. It also has been 
widely known that gene regulatory mechanisms may vary significantly in different cell environments. 
This phenomenon can be indirectly observed by inspecting inconsistent association patterns of relevant 
eQTLs in different cell types: a putative eQTL may only show associations in some specific cell types 
when certain regulatory mechanism is active, whereas the very mechanism becomes inactive in some 
other environments (e.g. prohibited by some regulatory machinery), the association signal is lost. 



The data set that we use is originally published by Dimas et al. (2009). In their experiment, the 
investigators genotyped around 500,000 single nucleotide polymorphisms (SNPs) genome-wide in 85 
unrelated western European individuals (75 remained after controlling for potential population stratifi- 
cation). Expression levels from this set of individuals were measured genome- wide in primary fibroblasts, 
Epstein-Barr virus-immortalized B cells (lymphoblastoid cell lines or LCLs), and T cells. The expres- 
sion data went through quality controls and normalization steps by the original authors, we further 
select a subset 5,011 genes that are highly expressed in all 3 cell types and perform addional quantile- 
normalizations for each gene in each cell type. For demonstration purpose, we map eQTLs for each 
gene separately and narrow the search for eQTLs in the cis-region (i.e. the coding region and its close 
neighborhood) of each gene (note, this is also the strategy adopted in the original publication). 

Our goals for mapping tissue-specific eQTLs are two-fold: firstly, we aim to identify eQTLs that show 
associations in at least one examined cell type (In practice, this is often treated as a hypothesis testing 
problem); secondly, we are interested in investigating the tissue-specificity of the identified eQTLs. We 
achieve both goals in a unified model selection/comparison framework by inferring a two-way table of 
genetic effects for each gene. More specifically, in each two-way table of interest, the columns represent 
different candidate cis-SNPs and the rows represent different cell types. Furthermore, the skeleton of 
each two-way table is of our primary interest. 

Because the expression levels of each gene in different cells are measured from the same set of unrelated 
individuals, the MVLR model becomes a natural choice for likelihood function. For this data set, the 
czs-region of a gene contains approximately 300 SNPs. 
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To specify the prior distribution for the two-way table of each gene, we follow the discussion in section 



3.2 and assume genetic effects for different SNPs are a priori independent. For each SNP, the possible 
qualitative association pattern with respect to a target gene in 3 cell types (i.e. the skeleton at SNP level) 
can be represented by a binary 3-vector, 7. To specify the a priori correlation structure, we consider, 
given that an eQTL is persistently active in a group of cell types, it is likely that the underlying 
regulatory mechanism is invariant. It is therefore reasonable to assume the genetic effects of the eQTL 
among this group of cell types are highly correlated. Based on these assumptions, the T g matrix of a 
two-way table containing p cis-SNPs can be precisely represented by its skeletons as 

r 9 = e? = i {ii ® 70 • 

Next, we use the scale- invariant prior formulation and apply the Bayesian meta-analysis prior proposed 



by Wen and Stephens (2011) to specify the conditional distribution A 9 | T g . For each SNP, this prior 
requires two parameters (4>, uj) to describe the genetic effects among cell types where it is assumed 
an active eQTL. In brief, parameter uj characterizes the magnitude of the average genetic effect and 
parameter <ft characterizes the heterogeneity of the effects (detailed formulation can be found in appendix 



A.3). Furthermore, instead of fixing a single set of (4>,uj) values for all eQTLs, we assume that (4>i,uj 



for SNP i, is uniformly sampled from the following set of combinations, 

L := {(^ 1) ,uj {1) ) : (0.01, 0.20), (0.01, 0.40), (0.01, 0.80), (0.01, 1.20)}, 

where the various levels of uj values intend to cover a range of potentially small, modest and large eQTL 
effects and the relative small <fi value reflects our prior belief of low heterogeneity across non-zero genetic 
effects for a given SNP-gene pair due to the hypothesis of invariant regulatory mechanism. 

Finally, we specify the prior distribution on possible skeletons (or equivalently T g ). By the a priori 
independence assumption on SNPs, for a total of p candidate cis-SNPs , we assume the prior of a given 
skeleton can be expressed as the product of the prior probabilities of its composing columns, i.e., 

p 

Pr(T fl ) = IJPr(7i)- (5-1) 

i=l 

We further treat all SNPs exchangeable and among 2 3 possible configurations of 7^, we assign the most 
probability mass to the null configuration by specifying Pr (7^ = (000)) = 0.99. Our intention here is to 
encourage overall sparse skeletons across cis-SNPs, which is motivated by the fact that vast majority 
of cis-SNPs are indeed not associated with the expression levels of a given gene. Given that a SNP is 
an eQTL, we assume the consistent configuration is mostly likely, i.e. Pr (7^ = (111)) = 0.005, and the 
rest of the tissue-specific configurations are equally likely by assigning prior probability 0.005/6 to each 
of them. 

Remark 1. It is important to point out that genome- wide expression-genotype data are typically 
informative on the distributions of configurations of 7 and effect size grids in L. In other words, those 
distributional parameters can be effectively "learned" by pooling information across all genes using 



some hierarchical model approaches (Veyrieras et al. (2008), Maranville et al. (2011)). In fact, the 



hyper-parameters we select here are closely related to the estimations from fitting such a hierarchical 
model, however these details are not our focus in this paper. 

Remark 2. Under this prior specification, the induced marginal prior on the genetic effects of a single 



SNP can be viewed as a multivariate version of "spike-and-slab" prior (Mitchell and Beauchamp (1988), 



15 



George and McCulloch (1993), Ishwaran and Rao (2005)) with the "slab" part also being a mixture of 
multivariate normals. 



Based on the complete model specification, it is straightforward to implement an MCMC algorithm 
to perform a full posterior inference on T g for each gene. However, this approach is computationally 
expensive especially when analyzing the data genome-wide. Alternatively, we have designed a greedy 
algorithm to efficiently search the posterior mode of skeletons. We give the details of this algorithm in 
appendix |FJ In brief, this algorithm performs a forward selection by starting at T g = and proposes 
to add a new candidate SNP with some non-null configuration at a time in a greedy fashion. The 
admission of a proposal depends on if the new T g achieves a higher posterior probability. Because of 
the prior computation is trivial, the essential part of the algorithm involves evaluating Bayes factors for 
each candidate T g . Since error variances are unknown, we compute ABF* to approximate BF(r^) in 
our implementation. 



We applied this algorithm to Dimas et al. (2009) data. Although the sample size is limited and our 
priors are quite stringent (for identifying eQTLs with small to modest effects), we are able to map 
375 eQTLs for the set of 5,011 genes. Among these 375 eQTLs, 291 are called tissue-consistent. We 
are also able to confidently identify a few genes with multiple cis-eQTLs that are not due to linkage 
disequilibrium (LD), suggesting the involvement of multiple regulatory elements in the transcriptional 
regulation process of the target genes. Furthermore, most of these multiple eQTLs have different pattern 
of tissue specificity, verifying the tissue-dependent nature of the gene regulation processes. 

We show such an example in Gene YBEY, where three cis-eQTLs, rs2075906, rsl23298650 and rs2839265, 
are identified from a total of 236 candidate SNPs. The selection path, the tissue activity configuration of 
each eQTL and the relevant Bayes factors are shown in Table [T] We note that the genotype correlations 
among the three called eQTLs are all very weak: the r 2 values for the pairs (rs2075906, rsl23298650) 
and (rs2075906, rs2839265) are close to 0, and the r 2 value between rsl23298650 and rs2839265 is merely 
0.06. To further examine the called tissue activity configuration, we estimate the marginal effects of each 
SNP in each individual tissue type and plot the results in Figure [TJ which confirms that our inference 
on cell- type specificity of the eQTLs is quite sensible. 



Biologically, we find that both rs2075906 and rs2839265 have been validated as eQTLs by other inde- 
pendent experiments (source: eQTL browser at http://eqtl.uchicago.edu). Further bioinformatics 
analysis (F. Luca, personal communication) reveals that the C allele of SNP rsl2329685 disrupts the 
binding of transcription factor XBP-1, a regulator of YBEY. More interestingly, XBP-1 is only activated 
by Epstein-Barr virus treatment and therefore presented only in LCL. 



6 Conclusion and Discussion 



In this paper, we have systematically studied the problem of statistical inference on unobserved multiway 
tables, we have shown that multiway tables are naturally formed in many commonly used linear model 
systems and given a generalization in the SSMR model. We have demonstrated the importance of 
hierarchical modeling and Bayesian prior specification in defining structures of multiway tables. We 
have also derived analytic expressions of exact and approximate Bayes factors for multiway tables 
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2.700 


consistent 
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Not Added 
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rs2839265 


2.693 


Fibroblast only 


10.485 


Added 


10 


rs2839309 


2.611 


LCL and T-cell only 


10.525 


Not Added 



Table 1: The selection path generated by the greedy search algorithm running on gene YBEY. ABF* s i ng i e 
is computed assuming that the candidate SNP is the sole cis-eQTL and under its best configuration, 
while ABF*(r £J ) assumes a skeleton including the SNPs already selected up to the point along with 
the candidate SNP in consideration. Some of the candidate SNPs are highly correlated with some 
existing SNP in the selection set (e.g. rs2839156, rs2839182), thus not added; Others (e.g. rs2096507, 
rsl029262), although less correlated with the existing set, do not provide strong enough "additional" 
effects to overcome the prior "penalty" . 
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Figure 1: : Examining single SNP effect of each called eQTL in each separate cell type for Gene 
YBEY. For each SNP in each cell type, we obtain the estimates of the effect size and its standard 
error by fitting a single linear regression model. The estimated effect sizes and their corresponding 95% 
confidence intervals are plotted for different cell types in a forest plot for each SNP. SNP rs2839265 has 
shorter intervals, because its minor allele frequency (0.28) is higher than the frequencies of rs2075906 
(0.09) and rsl2329865 (0.11) 



based on the SSMR model and a general multivariate normal prior assumption. These results provide 
us a set of statistical instruments to perform efficient Bayesian model comparison and selection for 
multiway tables. Finally, we have illustrated our statistical approach in mapping multiple potentially 
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tissue-specific eQTLs. 



Albeit the generality of our Bayes factors for arbitrary positive smei-definite W g matrices, we have placed 
a great deal of emphasis on context-dependent considerations for W g and its hyper-prior distributions. 
In our Bayesian hierarchical models, some important and context-specific properties of multiway tables 
are articulated through prior specifications. Take sparsity consideration as an example and consider a 
two-way table in a meta-analysis of multiple variant genetic association studies: it is quite reasonable to 
assume that only a few genetic variants are associated with the given phenotype; however, conditional 
on a particular variant is genuinely associated, it is most natural to expect its effects are presented 
in most studies. Thus, a general (non-structured) sparse assumption on the two-way table becomes 
inappropriate in this specific settings. 

We note in literature several Lasso-type of penalized likelihood methods have been proposed in estimat- 
ing two-way tables of regression coefficients based on models similar to multivariate linear regressions 



(Rothman et al. (2010), Yin and Li (2011)). We see our approach differs from these methods in three 



major aspects: firstly and most importantly, as discussed above, our approach requires more context- 
specific consideration in prior specification than generic sparsity assumption; secondly, our approach 
separates inferences of skeleton and estimation of regression coefficients, which makes results concep- 
tually easy to interpret; finally, our approach allows correlated predictors in regression models while 
under the similar scenario Lasso's performance would be affected without adjustments. Nonetheless, we 
acknowledge that our primary interest in this paper is not prediction, for which Lasso-type approaches 
can be extremely useful. 

The general results of approximate Bayes factors are derived through asymptotic approximations. As 
we have shown, their accuracies ultimately depend on whether RMLE/MLE-type estimators yield rea- 
sonably good estimates for unknown residual error variances. When sample sizes are limited and the 
number of response variables, r, or the number of predictors, p, is modestly large, it is likely that the 
proposed approximations become inaccurate. This scenario also implies that observed data are not 
sufficiently informative for I] and the good inference would inevitably rely on utilization of potential 
prior information. In the future work, to improve the performance of approximate Bayes factors in 
small sample situations, it is important to stress both of the following directions: firstly, improve the 
accuracy of Laplace approximation when likelihood surfaces are flat (some simple numerical strategies 



proposed in Wen and Stephens (2011) are also applicable here); secondly, construct context-dependent 



meaningful prior distributions for efficient estimation/integration of XI. 

It is worth pointing out that although we have exclusively focused on demonstrations in genetic and 
genomic applications in this article, our results can be applied in other areas of statistical sciences. 
The inference and model selection problems in structural equation models (SEM) have been extensively 



studied (Raftery (1993), Dunson et al. (2007)). In particular, a Gaussian directed acyclic graph (DAG) 



can be represented by a set of linear structural equations. There, the edge set of the graph is typically of 
interest for inference and is represented by a matrix of regression coefficients. Also linear model systems 
are also widely used in spatial-temporal modelings, where we also see multiway tables naturally emerge. 
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A Specification of W g in Genetic Applications 

In this section, we give detailed arguments to sequentially decompose W g into a block diagonal structure 
for a three-way table in genetic applications. We emphasize that this particular decomposition should 
serve as an illustration rather than a general guideline for dealing with genetic applications: especially 
when additional information becomes available, we can always improve upon it: see |Veyrieras et al. 



(2008), Stingo et al. (2011) for discussions. 



A.l Prior Decomposition by Genetic Variants 



Guan and Stephens (2011) argues that in genetic applications regression coefficients of genetic effects 
reflect the "causal" effects on the phenotype of interest and there is no obvious reasons to suspect those 
causal effects among different variants are correlated spatially (note, it is important to distinguish the 
correlation of observed genotypes and the independence of true genetic effects). Therefore, suppose 
there are p genetic variants in consideration, we assume their genetic effects are a priori independent. 
Consequently, W g can be decomposed as 

w 9 = $ie$ 2 e---e$p. (a.i) 

The i-th. block matrix $j are specific to the z-th genetic variants, describing its prior genetic effects with 
different phenotypes in different subgroups. In particular, <J?j = indicates that the i-th variant is not 
associated with any phenotype in any subgroup. 

A. 2 Priors for Multiple Quantitative Traits Associations 

Once W g is decomposed into the level of single genetic variants, further decomposition can be usually 
achieved in considering their associations with multiple phenotypes. 

The interplays between genetic variants and multiple phenotypes are complicated in nature: not only 
genetic variants can directly affect multiple phenotypes, there also exist interactions between phenotypes 



through gene networks. Recently, Stephens (2010) proposes a directed acyclic graph (DAG) approach 
to describe this complex system. Essentially, it first classifies phenotypes into clusters of affected and 
unaffected by a target genetic variant and decompose $j as follows: 

$i = 6ii e • • • e e ir , (A.2) 

where each block matrix 0^ characterizes the genetic effects of the i-th variant with respect to phenotype 
j. If a phenotype j is assumed unaffected, we set ®ij = 0; otherwise, the non-zero genetic effects 
are assumed a priori independent. Combining with a multivariate linear regression model, such prior 
specification leads to analyze associations of affected phenotypes with a target genetic variant conditional 
on unaffected phenotypes. 

A. 3 Priors for Heterogeneous Effects in Subgroups 

Finally, in the dimension of subgroups, the goal is to define heterogeneity of genetic effects of a specific 
variant with multiple phenotypes in different subgroups. 
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This problem has been considered by Wen and Stephens (2011) in Bayesian framework. One of the 



priors they have proposed has the following form to describe the genetic effect of the i-th variant with 
respect to the j-th phenotype in the k-th subgroup: 



Pg,ijk ~ N (A;,[ij] A\ 



(A.3) 



where scalar /9 a m denotes the "average" genetic effect of the variant-phenotype pair across all subgroups 
considered. Furthermore, 



P gM ~N(0,4). 



(A.4) 



In this formulation, parameter (f>ij characterizes the heterogeneity, and u>ij describes the magnitude of 
prior expected average genetic effect. This prior specification consequently implies that if genetic effects 
are assumed to present in all subgroups, Qij has the following structure (by integrating out P g Mj]) 



e, 



V 



^+4 



(A.5) 



The values of 



depend on the contexts of the applications. In a meta-analysis, heterogeneity 



of genuine associations might be assumed small, which leads to specifying small values of In the 
extreme case of a fixed effect model, <fiij is assumed to be exactly (and the resulting Ojj is singular). 
However, in detecting GxE interactions, heterogeneity of a true signal is expected to be large and 
large value of 4>ij should be considered. The detailed discussion on this topic can be found inlWen and 



Stephens (2011). This framework also allows to specify genetic effects in certain subgroups are exactly 
0, which becomes important in detecting qualitative GxE interaction (e.g. mapping tissue specific 
eQTLs, detailed in application section). 

Each block matrix Ojj also induces an indicator s-vector, denoting the non-zero associations in all 
the subgroups considered for the i-th variant and the j-th phenotype. Consequently, the resulting T g 
for the full multiway table can be mathematically represented by 



r„ 



ef =1 (®; =1 (7,, Tij)) 



(A.6) 



B Bayes factor Derivation 

In this section, we show the derivation of Bayes factors based on the SSMR model. 

Recall notations introduced in section [2] in the SSMR model: Y := {Yi, . . . , Y s }, X := {X\, . . . , X s } = 
{(X c> i, Xg ; x), . . . , (X C)S , X g>s )} and S := {£1, . . . , We further denote the complete collection of 
regression coefficients by B := {B\, . . . , B s } . 

The likelihood function is given by 

t n ■ s ( 1 s \ 

p(Y\X, B, S) = (2 7 r)- !Lj f 1 ^ • H |E<|-^ • etr -- ^ ^(Y - X^)'^ - X^) (B.l) 

i=l V i=l J 
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where function etr(-) denotes exponential of the trace. Given the least square estimate B{ for each 
composing MVLR, it follows that 

(Yi - XiBiYiY - XiBi) = (Yi - XiBiYiY - XiBi) + (B { - B-f (X^)^ - B t ). (B.2) 



Note this decompositio n holds even if Xi is rank-deficient (however, Bi may not be unique, see McCul 



lagh and Nelder (1989), page 82 for discussions). We further denote /3j := vec(-B^) and i := vec(i^), 



and use /3 aU and /3 aU to denote the sequentially concatenated vectors of (f3 1 , . . . ,(3 S ) and (/3 1 , . . . ,(3 S ) 



respectively. The likelihood function (B.l) can be re-written as 



p(Y\X, B, E) =(2tt) — • J] |E f |~r • etr -- ^\Y t - X^'iY - X^) 

i=l V i=l / 



(B.3) 



where 

$ = (8) s; 1 ) e • • • e (x' s x s ® s; 1 ) . 

Also, by the general case of Gauss-Markov theorem, we note that Var(/3 a n) = (In case that $ is 
singular, the Moore-Penrose pseudoinverse is applied). 

Although f3 sys and /3 all generally differ in the order of the elements, they can be reconciled by a permu- 
tation operation, i.e., 

= Asys' (B.4) 

where P is a (rps + r x (rps + r q^) permutation matrix. Furthermore, we denote 

and it can be shown that 

Var(/3 sys ) = J!" 1 . (B.5) 

As a result, 



p(Y\X, (3 sysl S) =(2tt)-t -ni^r^-etr Er 1 ^ - X^'PS - J 



' ex P ( n ( fisys flsys J ^ ( /^sys /^sys 



(B.6) 



B.l Bayes Factor for Known £ 

With E known, the marginal likelihood p(Y\X , E) can be evaluated analytically, i.e., 

p(F|X,E) = | p(Y|X,E,^ sy >(^ sys )d/3 sys . (B.7) 
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Recall the prior distribution 
/3 sys ~N(0,* c + Wg. 
Assuming W g is full-rank, the integration yields 



r Zjj n i -w—r n i 1 1 1 1 1 

p(Y|X,E) =(2vr) 2— . JJ — ^ • \W g \-2 • |* c |-3 • Ifi + fr- 1 © W- 1 ]-* 

i 

■ ex P (-\0' sys n {n- 1 - (n + *- x © ^/O (b.j 



etr " X&yVi ~ X iBi) \ , 



To further simplify (B.8), we decompose f2 into the following block matrix 



where O c and f2 s match the the dimensions of matrices \£ c and W g respectively. By (B.5), it follows 
that 

v- 1 = n g -n' f n- 1 n f . (b.9) 

Let 

u = n g - n' f (n c + ^c 1 ) -1 ^/ + w- 1 , 

and it follows that 

\n + y- 1 © w~ x \ = |n c + ^j 1 ! • |w|. (b.io) 

Furthermore, the result of the matirx product Q (O -1 — (O + ^~ x © W^" 1 ) -1 ) ^ can be represented by 

, , , • / A B \ , 
a block matrix I , I , where 

A = o c [J- (n c + ttj 1 )-^] - [i - ^c(^ c + K x y x ] n f u-% [i - (n c + tf- 1 ) -1 ^] , 

5= [j-^ c (fi c + ^- 1 )- 1 ] fyW^W- 1 

D = W" 1 - W~ X U~ X W~ X = {U- Wg 1 ) -{U- W~ X )U~ X {JA - Wg 1 ). 

Although the expressions are fairly complicated, when the limit VP" 1 — > is taken, A — > and B — > 0. 

The exact same calculation can be carried out for the null model. In the end, we obtain the following 
marginal likelihood 

P(Y\X, S, Hq) = (2vr) ^ • |Ei|-r • |^ c |~2 • |O c + ^J 1 ^ 

i 

• ex P f-^fe [n- 1 - (o c + m/- 1 )- 1 ) n c p c )j (b.h) 
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where (3 C is the sys estimated under the null model (i.e. restricting f3 = 0), similarly, B{ is the 
corresponding Bi estimated under the null model for the ith MVLR model. Note the relationship of 
the least square estimates between the full model and the null model: 

Bi = B C:i + (X'^Xcj^X'cjXgjBgj, (B.12) 

and 

ft - X^BiYiY - X Cyl Bi) - (Y - XiBiYiYt - XiBi) 

= B 'g,i { X 'g,i X g,i ~ X 'g,i X c,i( X 'c,i X c,'i)~ X 'c,i X g,i) B g,i 

It follows that 



(B.13) 



(B.14) 



This also gives the explicit expression for V g , i.e., 

V- 1 = ®U [( X 'g,i X 9,i ~ X ii X cA X 'c,i X c,ir 1X c,i X 9 ,i) ® S- 1 ] • (B.15) 

Finally, by taking the limit Vl/J 1 — > and noting 

lim U = V~ 1 + W- 1 , (B.16) 
Vc -1 ->o 

we obtain the final result 

BF(W g ) = li + ^W-.r^-exp [Wgil + Vg^Wg)- 1 ] V' 1 ^ . (B.17) 

B.2 Approximate Bayes Factors for Unknown £ 

When S is unknown, we assign independent inverse Wishart priors, YW s (Hi,mi) to each Sj and ad- 
ditional integrations are required to compute the marginal likelihood. More specifically, the goal is to 
evaluate 

p(Y\X) = J p(Y\X,V)Hp(^) d^ 1 ...dZ;\ (B.18) 

i 

where 

^(S- 1 ) cx |Sr 1 | !! ^etr {- l - Hl ^ . (B.19) 

The desired Bayes factor is therefore computed as 

BF(W Q ) = lim / i y 1 - >VUFK 1 \ x —, s —r. B.20 

9 *^ojp(Y\X,'£,H )Yl i p(X- 1 )d^ 1 ---dX7 1 
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By plugging in (B.8) and (B.ll) and noting the cancellation of |^ c | 2 terms along with the fact that 



is positive definite, it is easy to see that the remaining integrands, both are functions of 9~ , are 
bounded. It is then justified by bounded convergence theorem (BCT) to switch limit and integration 
operations. As a result, we obtain 



BF(W g 



fK Ha dXi 1 ---dXj 1 



where 



K Ha = \I + V^ x W g \~* ■ exp Q/? fl [V^Wgil + V^Wgr'V- 1 - Vg 1 ] A?) 

. f[ lErif^^- 1 • etr (-\ £ ^ (#« + (15 - XA)'(^ - X^)) ) 



(B.21) 



(B.22) 



2 • etr 



i=l 



It is also important to note that because of (B.14), (B.22) can be alternatively represented as 



k h . =\i + Vf^-i ■ exp (\0 t [v-Hr,([ + v-Hv^X' 1 ] A 



U^ 1 



n i+ m i-1j- r - 1 

2 



etr --^ Sr 1 fa + (Y - X^'iY - X cA B) 



i=l 



i=l 



Similarly, (B.23) can also be equivalently represented by 



j ! n i+ m i-1i- r - 1 



^ =exp(--/3 9 y- 1 /3 9 )-niS- 

i=i 

■ etr ^-i ^ S7 1 (Hi + (Yi - X^'iY - X^))! 



(B.23) 



(B.24) 



(B.25) 



Because V^ -1 and potentially Wg are both functions of X, the analytic integration of .K# a is generally 
implausible. Here we approximate both Ku a and -fCf/ by Laplace's method. Note, although the analytic 
integration of i^^ is straightforward, it is been shown (Wen and Stephens ( 2011[ )) that simultaneously 
applying Laplace's methods to both Kjj a and Kh achieves better numerical accuracy for desired Bayes 
factor. 

Laplace's method approximates an integral with respect to a d X d symmetric matrix Z (or equivalently 
the corresponding half-vectorized (d+ l)d/2 dimensional vector vech(Z)) in the following way, 



J h(Z)exp{g(Z)) dZ ^ {2Tr) d ^ d+1 ^ i \H^~ 1 / 2 h{Z)exp(g{Z) 



(B.26) 
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where 



Z = argmjix g(Z), 

and \Hg\ is the absolute value of the determinant of the Hessian matrix of the function g evaluated at 
Z. The technical requirements on the factorization are that h(-) is smooth and positively valued and 
g(-) is smooth and obtains its unique maximum in the interior of D. Although different factorization 
schemes generally achieve different approximation accuracy, the asymptotic error bounds are typically 



the same (for detailed disscusion, see Butler (2007) chapter 2). 



To evaluate the desired Bayes factor, we sequentially apply Laplace's method with respect to each E^ 1 
for both Kn a and Kjj . In what follows, we show the detailed derivations for ABF(W g ) and ABF*(W a ) 
using two different factorization schemes for Kjj a . 

B.2.1 Derivation of ABF 

The first application of Laplace's method attempts to expand the integrand at the well-known MLE of 
E^ 1 - In particular, we sequentially apply this strategy for each We start by factoring Kjj a into 

K Ha = / la (S7 1 )exp( 5a (Sr 1 )), (B.27) 

where 

MSr 1 ) = l J + Vg^Wg]-* ■ exp Q# [Vg^Wgil + V g - l W g )- l V~ l - V- 1 ] /O • |Sr i r i "T r " 1 

s , 1 / 1 s \ 

• II ISJ 1 !^^ etri ,zZ {Hj + (Xs - XjBjyiYj - X jBj )) 

3=2 \ 3=2 ' ) 



(B.28) 



and 



ga^i 1 ) = y log l^r 1 ! " ^race (s^ 1 (#i + (*i " - XiBi))) . (B.29) 

This factorization is likely to yield an accurate approximation when m — > oo, because by law of large 
number 

lim —(Y x - XiBxym - XiBx) = Si. (B.30) 

?ii-s>oo m 

This also gives error bound of the approximation O(^). Note that we can factorize 

|E^| 2 = I Ei X | a • IS^" 1 1 — ^ , 

for some arbitrary k (and here we use k = 0). As long as n > fc, the error bound of the approximation 
does not change. 
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It is straightforward to show that the unique maximum of g(E 1 ) is attained at 
E x = — (h x + (Y l - X 1 B 1 )'(Y 1 - XtB^) . 



(B.31) 



Because H\ is positive definite and {Y\ — XiB\)'{Yi — X\Bi) is at least positive semi-definite, Ei is 
invertible. Thus, 



To compute the Hessian matrix H ga (T,^ 1 ), we follow (Minka (2000)) and it can be shown that 

d 2 g a 



dvech(S 1 1 )dvech(S 1 
D' s (Ei®Ei)I> a 



iv 



(B.32) 



(B.33) 



where D s denotes the duplication matrix for s x s symmetric matrices. When the Hessian is evaluated 
at E^ -1 , its absolute determinant results in the following simple form, 



2~ r n 



r r( r +l)/2|f, , r +l 



(B.34) 



Similarly, we expand Kjj around Ei based on expression (B.25), which results in the following decom- 
position, 



K Ho = M s i 1 )exp(fif (S 1 x ) ), 



(B.35) 



where 



and 



Msr 1 ) = n i s 7 x i — s — • etr -2 E S 7 X + - ^'to - ^)) 

i=2 \ j=2 ^ / 

• exp(--/3' 9 ^" 1 /3 9 ) • lEr 1 ! * 



^(Er 1 ) = y logjE^ 1 ! - -trace (s^ 1 + (Y 1 - X 1 B 1 )'(Y 1 - X x B lt 



(B.36) 



(B.37) 



Noting that go(Ti 1 1 ) and <7i(S 1 1 ) are identical, Ei also uniquely maximizes go(^i )■ 



Sequentially, we apply the similar factorization and maximization procedure to all composing Ej's and 
we obtain the following approximation, 

s 



BF(W g ) =\I + Vg~ l W g \~^ • exp ( \w g {I + V^ 1 ^)" 1 ] V g X $ g ) • J] ( 1 + °(^) ) (B-38) 



1 

Tli 



where V g 1 and W g are the corresponding V g 1 and Wg evaluated at (Ei, . . . , E s ). In particular, 



K,i^3,i ~ X 'g,i X c,i{X' c i X^i) 1 X' ci Xg ii ) <g) 1 



This leads to the final expression of ABF 



ABF(W S ) = \I+ V^W^ • exp ( \$ g V- x [w g (I + f^ 1 ^)" 1 ] 



(B.39) 
(B.40) 
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B.2.2 Proof of the Asymptotic Consistency of ABF 



Proof. Under the condition stated in proposition 1, it follows that 

Sj -4' Ej, for i = 1, . . . , s 
It then follows from the continuous mapping theorem that 

ABF(V^) a 4-BF(Hg, 
in which BF(W 9 ) is evaluated using the true Ej's. 



□ 



B.2.3 Derivation of ABF* 



The second application of Laplace's method expands Kjj a around the MLE of E^ 1 under the null model. 



Based on (B.24), we factorize Ku a into, 

Msr 1 ) = \i+ Vg-'Wgr 1 * ■exp(^$' g v- 1 w g (i + v- 1 w g )- 1 v- 1 9 ) ■ l^r 1 !™ 1 "^ 

S rij -\-rrij — q„- — r— 1 / 1 ^ / 

■ II l E 7 I 2 • etr "2 £ E 7 + ^ - X *i B i)'(rj - X cjB 

3=2 \ 

and 



3=2 



(B.41) 



^(S- 1 ) = ^ log - -trace (E7/ 1 (#i + (Y 1 - X cA B 1 )'(Y 1 - X cA B ± ) 



(B.42) 



Next, we factorize Kh in such a way that g a (^i 1 ) = 5o(E 1 i ), and 



/ l0 (sr 1 )=ni s 7 1 

3=2 



n j+ 7n j~ c lj — *~ — 1 



3=2 



(B.43) 



This factorization achieves the desired error bound because of (B.13) and the the following limiting 
condition 



1 



lim — {X' iXg i - X' iXciiX'iX^i) X' ci X g i) — Qi. 



rii->oo n 



It follows that the unique maximum for go and g\ is attained at 
El = — (h x + (Y 1 - X c ^B l )'{Y 1 - X c +B{) 



(B.44) 



(B.45) 



The remaining calculation is similar to what we have shown in the previous section, and the resulting 
approximation is given by 



BF(iy g ) = \I+ V^Wg]- 1 * • exp ( \p' g V- x \w g (I + V g l W g y l \ V~ 1 $ g ) • ]J ( 1 + O(-J-) ) , (B.46) 



i=l 



1 

m 
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where V g 1 and W g are the corresponding V 1 and W g evaluated at (Si, . . . , S s ). In particular, 



V~ 



{ X g,i X 9,i ~ X 'g,i X c,i{ X 'c,i X c,i) 1X 'c,i X g,i) ® S j 1 



Thus, we have obtained an approximation to the desired Bayes factor, 

ABF*(Wg = \I + V g - x W 9 \-\ • exp (\$ a V g l [w g (I + V^Wg)' 1 ] V' 1 ^ 



Note the relationship, 

vec(B^) = ({X' gii X g>i - X'^X^X^X^X^X^y 1 X' gji i) vec[(y - X c ,^)'], 
and let 

k g = ®u (x g ,®t- 1 ). 

We then can represent ASF*{W g ) using only the estimates from fitting the null model, i.e., 



ABF*(W s ) = |J + V g - 1 W g |-2.exp l-0 c k g W g (I + V^W,)- 1 K g P c , 



(B.47) 



(B.48) 



(B.49) 



(B.50) 



C Computational Stability of Bayes Factors 



In this section, we show the computational stability of the derived Bayes Factors. The stability issue 
may arise in practice when highly correlated predictors are included in the models (where V g might 
become singular). We show that the derived Bayes Factors and their approximations can still be stably 
evaluated when the design matrix is or close to rank deficient. 



First, we denote 

Gi= (i — X C;i (X' c i X C:i ) X' c ^j X, 



9,n 



(CI) 



and its p x m Moore- Penrose pseudo inverse matrix by Gf. By the general least square theory, it can 
be shown (regardless if Gi is full-rank) that 



B. 



9^ 



and 



^ = vec(^) = (G+0/)vec(y/) 



It is then follows from the general property of Moore-Penrose pseudo inverse, such that 

V 9 ~iKi = [(G'iGiGt) S- 1 ] vec(Y/) 
= (G^S- 1 )vec(y/) 
= vec(S~ 1 y/G i ). 



(C.2) 
(C.3) 
(C.4) 

(C.5) 
(C.6) 
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Finally, V g 1 (3 g is computed by sequentially concatenating V gi f3 gi for i = l, ...,s. 
Assuming X' c ^X c ^ can be inverted in the general sense, we note that in this formulation 

1. There is no matrix inversion directly involving X' gi X gj i, which might be rank deficient due to 
highly correlated predictor variables co-exist in X g . 

2. There is no matrix inversion directly involving W g . 

3. Matrix (/ + V^Wg) is guaranteed positive definite. 

If SI is unknown and some design matrix Xi is rank deficient, to obtain Sj and evaluate ABF, it becomes 
inevitable to carry out Moore-Penrose inverse for X[Xi in computing (Yi — XiBi)' (Yi —XiBi). However, 
in case of ABF*, only X Cj i is required for computing Sj, which is assumed full-rank. 



D Computing Bayes Factors with Singular W„ 



We first give the proof for proposition 2 with unknown XI. 



Proof. By definition (4.9), when W g is singular and residual error variance £ is unknown, the desired 



Bayes Factor is computed by 

hm A ^ / K Ha {Wl{\)) d^ 1 ... dZj 1 



BF(PF„ 



/ K Ho dX- 1 ...dZ7 1 



(D.l) 



where the integrands Kn a and Kh are defined in (B.22) and (B.23) respectively. It is clear that 



K Ha (W}W)<tl 



8=1 



is; 



• etr ( --E7 1 ( Hi + (Yi - X^'fr - X^) 



fD.2l 



Because the RHS is clearly integrable with respect to S 1 , 
(BCT), 



Tj s , by bounded convergence theorem 



BF(iy g 



It follows that 



j\\m x ^K Ha {Wl(\))d^ 1 ...dY,- 1 
jK Ho dZ- 1 ...dZs 1 



(D.3) 



hmK Ha (W}(\))=K Ha (W g ), 
A— >0 



(D.4) 



because Ku a does not involve direct inverse of W g and matrix sum (I + V g W g ) is guaranteed full rank. 
Therefore, Bayes factor 



BF(W g 



fK Ha (W g )d^\..d^ 



9 ' fK Ho d^ 1 ...dZs 1 
exists and well-defined for singular W g . 



(D.5) 
□ 
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In applying Laplace's method to approximate (D.5), we note that the factorization shown in appendix 



B.2.1 is still mathematical valid. However, its accuracy is likely worse, because the linear constraints 
imposed by the singularity of W g is ignored: under the linear restrictions, the integrand (with respect 
to S) is peaked around the constrained estimate of S rather than the unconstrained one. Hence, we 
should modify the factorization based on (B.24) to reflect this fact: 



, S; 1 ) =\I + V~ x W g \-\ • exp Q/3' 9 V^W g (! + V^Wg)-^- 1 ^ 



(D.6) 



and 



-1 



i=i 



11 



1 log I E7 1 1 



1 



trace ( Sr 1 ( H { + (Y { - X l B r i )'{Y i - Xtf] 



(D.7) 



where B\ is the least square estimates of Bi subject to the linear constraints and g is the corresponding 
vectorize d estim ates. The remaining arguments for Laplace's method is the same as we have shown in 
appendix 



B.2.1 



1 



however Sj is now taking the following form: 



Hi + (Y { - XiBD'iYi - X { B\ 



(D. 



which takes the linear constraints into account. 

The derivation for ABF* is invariant under this setting, and the result remains the same. 



E Connections with Test Statistics and BIC 



E.l ABF* and Rao's Score Statistic 



Following Chen (1983), it is straightforward to derive Rao's score statistic, T score for testing Hq : j3 g = 
based on the SSMR model. More specifically, 



(E.l) 



T score = J^vec^ - XciBi)']' (X' gti X g ,i ® S" 1 ) vec[(Y t - X^Bi)'} 
i=i 

Given that the prior specification W g = cV g and W g is assumed full-rank, it is easy to show that 

PgV- 1 \W g (l + V g - l W g r l ] 



9 9 

9 9 M 9 



1 + C 

C 



U (x' i X gti ®Zri 



Pc- 
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This yields, 

ABF* (W g ) 



rps 



1 + C 



ex.]) I 2 * ]^ q -^score 



E.2 Connections with BIC 



Under the conditions that 



1. V g and W g are full-rank, 

2. hm n ^ ™ = 0,Vi, 

3. rii » p, r, s, Vi, 



We show that BIC can be derived from the Bayes factor and its approximations based on the SSMR 
model. 

For each consisting multivariate linear regression in MVLR, independent subjects are sampled from a 
population. Under this setting, it is commonly assumed that 



1 



lim — {X' gi Xg i - X' iXciiX'iXci) X' ci X g i) — Qi, 



and Qi is also full-rank. Hence, 

1 



lim V a 

raj— >oo 



n. 



{Qi 1 ® 



(E.2) 



(E.3) 



When S is known, as m — > oo for each i, based on (E.3) 



lim (I + V g - 1 Wg)=V g - 1 W g , 

raj— >oo,V« a M 



and 



Note that 



i=l 



and the likelihood ratio 

p(y|X,B,E) 



p(Y|X,B,E,#o) 



exp Q^^Ar) 



(E.4) 



(E.5) 



(E.6) 



(E.7) 
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Finally, we obtain 



s / s s \ 

log BF(W g ) « (log Lt - log Lq) - V - ^logni + I - J>g \Qi\ - | £ log |E<| - - log \W g \\ 

i=l \ i=l i=l / m j 

= (log Li - logL ) - ^ ^logn, + 0(1), 

i=l 

which is essentially the BIC. 



It is important to note that if V g or W g is singular, (E.4) and the rest of the derivation, will not be 
valid. This is understandable intuitively: the singularity reflects linear constraints on parameter space, 
and the number of "free" parameters is no longer trivial to count and BIC may not be well defined. 
Also, if W g is also a function of sample sizes, then the condition 2 might be violated and the resulting 
error term may not be of the order 0(1). Similarly, condition 3 is also important to ensure the desired 
error bound for BIC. 

When S is unknown, in a similar way, it can be shown that 

logABF(Wg) « l$' g V- l g ~ y J>gn, + ( £ J>g|Qi| " f E lo Sl^l " ^ lo gl^l ) » ( E - 9 ) 

i=l V i=l i=l / 

and 

logABF*(Wg « i^F-i^-^^logn^ (^log|Q<| ~ f^logl^l - \log\W g \ J . (E.10) 

i=l \ i=l i=l ~ / 

Asymptotically, 

lim $' g V- l g ->• /3 9 y-^ 9 . (E.ll) 

Furthermore, it can be shown that 

lim E i = E i + Si i Q < fl 1 , jij (E.12) 

and 

B' V~ X Q 

lim " 9 9 Pg = C, (E.13) 
m-now $ g V g - l p g 

where C is a constant. Thus, 

^y- 1 ^ = + 0(1). (E.14) 

This yields our final results: under the conditions stated 

s 

logABF(Wg) = (log Li - logL ) - ^ ^logn; + 0(1), (E.15) 

8=1 



and 

logABF*^) = (log L x - logLo) - y J^logn, + 0(1). (E.16) 



pr 



2 

i=l 
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F A Greedy Algorithm for Searching Posterior Mode of T g 

In this section, we give the detailed general descriptions of a greedy algorithm for searching the posterior 
mode of a two-way table. 

We first define posterior score 

S P ost(r 5 ) :=log 10 Pr(r 9 ) + log 10 BF(r 9 ). (F.l) 
It should be clear that exp(5 , pos t) oc Pr(T 9 |y, G). 

The algorithm begins by evaluating all r 9 's containing exactly one SNP and constructing a priority list 
I: 

1. For SNP i = l to p: 

(a) For subgroup configuration j = 1 to s: 

i. Construct T g ^j by a single SNP i with configuration j, compute S pos t(T g ^j). 

(b) Insert SNP i with the best configuration j = argmax^ S post (r ffi jfc) to the priority list I to 
represent SNP i. 

2. Sort priority list I in a descending order according to the posterior scores of all SNPs. 

3. Starting with T g = and S pos t(T g ) = 0, go through the list I in the following fashion 

(a) Construct a new skeleton, T' g , by adding a new SNP along with its best configuration to 
current T g , compute 5 P ost(r^). 

(b) If Sp OS t(r' g ) > S P ost(Tg), set T g = T' g , i.e. permanently add the new SNP to the selected set. 

(c) Otherwise, leave T g unchanged. 

4- Report final T g . 

Note that in step 3(b), T g and T' g differs at only one column. Suppose this column has configuration j, 
by our prior specification on T g , 

log 10 Pr(r;) - log 10 Pr(r 9 ) = log 10 7T + log(Aj) - log 10 (l - tt). (F.2) 

It then becomes equivalent that 

S P ost(r g ) > S post (T g ) o 
log 10 BF(r^) - log 10 BF(r 9 ) > log 10 (l - tt) - log 10 (7r) - log(A,). 

This indicates the selection is based on thresholding the Bayes Factors and the threshold is determined 
by the prior distributions. 
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